Hybrid Data Assimilation without Ensemble Filtering 


Ricardo Todling* and Amal El Akkraoui^1


Global Modeling and Assimilation Office, NASA/GSFC, Greenbelt, Maryland 


* Corresponding author address: Dr. Ricardo Todling, Global Modeling and Assimilation Office, 

NASA/GSFC, Code 610.1, Greenbelt, MD 20771. 

E-mail: ricardo.todling@nasa.gov 

1 Additional Affiliation: Science Systems and Applications, Inc., Lanham, MD 20706. 



ABSTRACT 


The Global Modeling and Assimilation Office is preparing to upgrade its three-dimensional
variational system to a hybrid approach in which the ensemble is generated using a
square-root ensemble Kalman filter (EnKF) and the variational problem is solved using the
Gridpoint Statistical Interpolation system. As in most EnKF applications, we found it necessary
to employ a combination of multiplicative and additive inflations, to compensate for sampling
and modeling errors, respectively; to keep the small-member ensemble solution
close to the variational solution, we also found it necessary to re-center the members of
the ensemble about the variational analysis. During tuning of the filter we found
re-centering and additive inflation to play a considerably larger role than expected, particularly
in a dual-resolution context in which the variational analysis is run at higher resolution than the
ensemble. This led us to consider a hybrid strategy in which the members of the ensemble
are generated by simply converting the variational analysis to the resolution of the ensemble
and applying additive inflation, thus bypassing the EnKF. Comparisons of this so-called
filter-free hybrid procedure with an EnKF-based hybrid procedure and a control, non-hybrid,
traditional scheme show both hybrid strategies to provide equally significant improvement
over the control; more interestingly, the filter-free procedure was found to give qualitatively
similar results to the EnKF-based procedure.

1. Introduction


It is now generally accepted that a practically feasible way to introduce flow dependence in
the background error covariances needed for either sequential or variational data assimilation
procedures is to rely on an ensemble of short-range forecasts. Multiple works have now shown
(Whitaker et al. 2008, Buehner et al. 2010, and Clayton et al. 2012) that combining the
time-varying background error covariance derived from an ensemble of forecasts with the typical,
stationary, climatological background error covariance leads to non-trivial improvements in
the resulting, so-called, hybrid data assimilation system (Lorenc 2003). Most operational
weather centers use three- or four-dimensional variational (3D/4DVar) techniques and have
implemented hybrid approaches in these contexts. With the variational component capable
of accepting hybrid formulations of its underlying background error covariance, what remains
to be specified is a methodology to generate the required ensemble of forecasts. Presently,
the Global Modeling and Assimilation Office follows the National Centers for Environmental
Prediction and uses the square-root-based ensemble Kalman filter (EnKF; Whitaker et al.
2008) for this purpose. The small number of ensemble members used in practice requires
care to obtain adequate spread from the ensemble of forecasts to represent forecast errors.
It is thus necessary to adjust the ensemble of analyses and: (i) apply multiplicative
inflation to compensate for sampling errors; (ii) apply additive inflation to represent model
uncertainties; and (iii) re-center the ensemble of analyses around the (hybrid) variational
analysis to prevent possible divergence between the two assimilation systems.

During the process of implementation and testing of the EnKF to provide initial conditions
for the ensemble of forecasts for a hybrid strategy to be adopted for the Goddard Earth
Observing System (GEOS) atmospheric data assimilation system (ADAS), we have found
steps (ii) and (iii) above to play a significant role in determining the behavior of the ensemble
of forecasts. This is particularly noticeable when the ensemble and the (hybrid) variational
analyses are produced at different resolutions in a so-called dual-resolution approach. That
re-centering and additive inflation are such key components of the hybrid strategy is
illustrated in Fig. 1, where the incremental contribution to the 500 hPa temperature field is
shown for an arbitrarily selected member of the ensemble, at an arbitrarily selected time,
after the EnKF has cycled beyond a spin-up period. The panels in the figure correspond
to increments at various stages in the ensemble analysis procedure: directly from the EnKF
(top left), when only multiplicative inflation has been applied; when the EnKF increment
is re-centered around the (hybrid) variational (higher resolution) analysis (top right); when
applying additive inflation to the EnKF increment (bottom left); and when multiplicative
inflation, additive inflation, and re-centering have been applied to form the total increment
(bottom right). Re-centering is clearly the largest contributor to the total increment. Still,
the main features in the increment obtained from the EnKF assimilation of observations
remain clearly identifiable after re-centering and additive inflation have taken place. At first, these
results might suggest the EnKF to be poorly tuned; however, as we will show later, this is
far from being the case. One key factor is that the EnKF analyses are at coarser resolution
than the (hybrid) variational analysis used for re-centering; when the ensemble is at full
resolution, the contribution from re-centering is much smaller (not shown).

The crucial role played by steps (ii) and (iii) prompted us to investigate what would
happen if we bypassed the EnKF step altogether. This led us to the so-called filter-free
ensemble scheme, in which ensemble analyses are generated by simply adding perturbations to
the central (hybrid) variational analysis; that is, steps (ii) and (iii) are what constitute
the ensemble analysis strategy. The additive perturbations used in this procedure
correspond to samples of the scaled 48-minus-24-hour forecast differences, similar to those used
to generate the climatological background error covariance of the traditional assimilation
approach; these are also the perturbations used when the EnKF is exercised. The remainder
of this manuscript presents a comparison of results obtained from dual-resolution hybrid
3DVar procedures when either the EnKF or the filter-free approach is used for the ensemble
analysis generation.


2. Brief overview and the filter-free strategy

The basic idea of hybrid variational data assimilation is to use an ensemble of background 
fields to introduce instantaneous, flow-dependent, features to the traditionally non-evolving 
(static) background error covariance. In 3DVar this can be done by augmenting the control 
vector with an extra set of variables, usually referred to as alpha-control variables. The cost 
function of a hybrid incremental 3DVar system can be written as 

$$ J(\delta\mathbf{z}) = \tfrac{1}{2}\, \delta\mathbf{z}^T \left[ \beta_s \mathbf{B}_s + \beta_e \mathbf{T}^T (\mathbf{B}_e \circ \mathbf{L}) \mathbf{T} \right]^{-1} \delta\mathbf{z} + \tfrac{1}{2}\, (\mathbf{d} - \mathbf{H}\delta\mathbf{z})^T \mathbf{R}^{-1} (\mathbf{d} - \mathbf{H}\delta\mathbf{z}) , \qquad (1) $$

where the control variable $\delta\mathbf{z}$ is a combined contribution from the n-vector solution $\delta\mathbf{x}$ of the
standard variational problem and a component that comes from an M-member ensemble,
that is,

$$ \delta\mathbf{z} = \beta_s\, \delta\mathbf{x} + \beta_e \mathbf{T}^T \sum_{m=1}^{M} \boldsymbol{\alpha}_m \circ \Delta\mathbf{w}^e_m . \qquad (2) $$

Here, the symbol $\circ$ stands for the Hadamard-Schur (element-wise) product of two vectors,
$\boldsymbol{\alpha}_m$ is the m-th control vector related to the m-th ensemble member, and, using the symbol
$\Delta$ to denote deviation from the mean, $\Delta\mathbf{w}^e_m = (\mathbf{w}^b_m - \bar{\mathbf{w}}^b)/\sqrt{M-1}$ is the m-th ensemble
perturbation created from the m-th member background $n_w$-vector state $\mathbf{w}^b_m$, with respect to
the ensemble mean $\bar{\mathbf{w}}^b$. The formulation allows for the ensemble members to be of different
(usually lower) resolution than the primary n-vector control $\delta\mathbf{x}$, with the operator $\mathbf{T}^T$ being
responsible for resolution conversion. In (1), the matrices $\mathbf{B}_s$ and $\mathbf{B}_e$ stand for the static and
ensemble background error covariances, respectively; the matrix $\mathbf{L}$ stands for a correlation
matrix responsible for localization of the ensemble; the last term is the usual observation-fit
term involving the observation error covariance matrix $\mathbf{R}$ and the observation residual
p-vector $\mathbf{d} = \mathbf{y} - \mathbf{h}(\mathbf{x}^g)$, created by differencing the observation p-vector $\mathbf{y}$ with the projection
of the first-guess state vector $\mathbf{x}^g$ onto observation space by the observation operator $\mathbf{h}$, whose
linearization is represented by the matrix $\mathbf{H}$. The parameters $\beta_s$ and $\beta_e$ specify the interplay
between the static and the ensemble background error covariances, respectively. The problem
is reset to its traditional 3DVar configuration, with solution $\delta\mathbf{x}$, when $\beta_s = 1$ and $\beta_e = 0$.
Details of the hybrid variational problem can be found in Hamill and Snyder (2000), Lorenc
(2003) and Wang et al. (2007).
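
As a small illustration of the bracketed term in (1), the following sketch builds a hybrid background error covariance on a toy one-dimensional periodic grid. It is only a sketch under stated assumptions: the resolution-conversion operator is taken to be the identity (single resolution), the static covariance and the localization are both given Gaussian shapes, and the function names and length scales are hypothetical choices made for illustration, not part of GSI.

```python
import numpy as np

def gaussian_correlation(n, length_scale):
    """Gaussian correlation matrix on a periodic 1-D grid of n points (illustrative)."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(dist, n - dist)  # shortest distance on the periodic grid
    return np.exp(-0.5 * (dist / length_scale) ** 2)

def hybrid_covariance(b_static, ens_backgrounds, localization, beta_s, beta_e):
    """Return beta_s * B_s + beta_e * (B_e o L), assuming T is the identity.

    ens_backgrounds has shape (n, M): one background member per column.
    """
    m = ens_backgrounds.shape[1]
    mean = ens_backgrounds.mean(axis=1, keepdims=True)
    pert = (ens_backgrounds - mean) / np.sqrt(m - 1)  # normalized perturbations
    b_ens = pert @ pert.T                             # sample covariance B_e
    # '*' is the element-wise (Hadamard-Schur) product that applies localization
    return beta_s * b_static + beta_e * (b_ens * localization)

# Example: 50% static and 50% ensemble contributions
rng = np.random.default_rng(0)
b_s = gaussian_correlation(16, 2.0)
loc = gaussian_correlation(16, 4.0)
members = rng.standard_normal((16, 8))
b_hybrid = hybrid_covariance(b_s, members, loc, 0.5, 0.5)
```

Setting `beta_s = 1` and `beta_e = 0` in this sketch recovers the static covariance, mirroring the reduction to traditional 3DVar noted above.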

The first hybrid implementation studied in the present work relies on the ensemble
square-root Kalman filter formulation of Whitaker and Hamill (2002). Every 6 hours the ensemble
analysis updates the ensemble mean and its members through the sequence

$$ \bar{\mathbf{w}}^a = \bar{\mathbf{w}}^b + \sum_{j=1}^{p} \mathbf{k}_j \left[ y_j - h_j(\bar{\mathbf{w}}^b) \right] , \qquad (3a) $$

$$ \Delta\mathbf{w}^a_m = \Delta\mathbf{w}^b_m - \sum_{j=1}^{p} \mathbf{k}_j \gamma_j \, \delta h_{m;j} , \qquad (3b) $$

where $y_j$ is the j-th observation, $\delta h_{m;j}$ is the j-th element of the incremental factor
$\delta\mathbf{h}_m = \mathbf{H}\Delta\mathbf{w}_m \approx \mathbf{h}(\mathbf{w}^b_m) - \mathbf{h}(\bar{\mathbf{w}}^b)$ resulting from the fact that observations are not perturbed in this
formulation, and the $n_w$-vector $\mathbf{k}_j$ is the j-th column of the gain matrix $\mathbf{K}$, given by

$$ \mathbf{k}_j = \frac{1}{\sqrt{M-1}} \sum_{m=1}^{M} \Delta\mathbf{w}^{j-1}_m \, \delta h_{m;j} / \sigma_j^2 , \qquad (4a) $$

$$ \Delta\mathbf{w}^{j}_m = \Delta\mathbf{w}^{j-1}_m - \mathbf{k}_j \gamma_j \, \delta h_{m;j} , \qquad (4b) $$

for $j = 1, 2, \ldots, p$, with $\Delta\mathbf{w}^0_m = \Delta\mathbf{w}^b_m$, and scalar coefficients $\sigma_j^2$ and $\gamma_j$ given by

$$ \sigma_j^2 = \frac{1}{M-1} \sum_{m=1}^{M} \delta h_{m;j}^2 + (\sigma^o_j)^2 , \qquad (5) $$

$$ \gamma_j = 1 \Big/ \left[ \sqrt{M-1}\, \left( 1 + \sigma^o_j / \sigma_j \right) \right] . \qquad (6) $$

Here only the diagonal elements $(\sigma^o_j)^2 = (\mathbf{R})_{jj}$ of the observation error covariance are referred
to, given that observation errors are assumed to be uncorrelated, thus allowing observations
to be processed serially (e.g., Houtekamer and Mitchell 2001); the algorithm above is a direct
application of the expressions in Appendix II.E of Bierman (1977) for when the square root
of the background error covariance is made up of column vectors $\Delta\mathbf{w}^b_m$, for $m = 1, 2, \ldots, M$.
After all p observations are processed, $\Delta\mathbf{w}^a_m = \Delta\mathbf{w}^p_m$, which follows from a backward
recursion of (4b) from $j = p$ to $j = 1$ and yields (3b). Just as when solving the variational hybrid
problem, localization is also needed and used in the square-root Kalman filter formulation
of Whitaker and Hamill (2002), though it is left out of the equations above for the sake of
notational simplicity.
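
The serial square-root update can be sketched in a few lines. The code below is a generic textbook-style rendering of the Whitaker and Hamill (2002) filter and is not the NOAA/ESRL software; it assumes a linear observation operator, uncorrelated observation errors, and no localization or inflation. It also works with unnormalized perturbations and evaluates each innovation against the sequentially updated mean, so the placement of the M - 1 factors may differ from that in (3)-(6). The function name `serial_ensrf` is hypothetical.

```python
import numpy as np

def serial_ensrf(ens, H, y, r_var):
    """Serial ensemble square-root filter update (no localization, linear H).

    ens   : (n, M) background members, one per column
    H     : (p, n) linear observation operator (assumption)
    y     : (p,) observations
    r_var : (p,) observation error variances (diagonal of R)
    """
    n, M = ens.shape
    mean = ens.mean(axis=1)
    pert = ens - mean[:, None]                    # unnormalized perturbations
    for j in range(len(y)):
        h_mean = H[j] @ mean                      # mean in observation space
        h_pert = H[j] @ pert                      # perturbations in observation space, (M,)
        sigma2 = h_pert @ h_pert / (M - 1) + r_var[j]
        gain = pert @ h_pert / (M - 1) / sigma2   # column of the gain for observation j
        gamma = 1.0 / (1.0 + np.sqrt(r_var[j] / sigma2))  # reduced-gain factor
        mean = mean + gain * (y[j] - h_mean)      # update the mean with the full gain
        pert = pert - gamma * np.outer(gain, h_pert)      # update perturbations, obs unperturbed
    return mean[:, None] + pert
```

In this idealized linear setting the resulting analysis ensemble has the sample mean and covariance given by the Kalman filter equations applied to the background sample statistics, which provides a convenient check of an implementation.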
The final ensemble of analyses, ultimately used to serve as initial conditions for the
ensemble of forecasts, is typically re-centered around the variational analysis and inflated
by scaled perturbations $\mathbf{e}_m$. That is, the m-th member final analysis is given by

$$ \mathbf{w}^a_m := (\mathbf{w}^a_m - \bar{\mathbf{w}}^a) + \mathbf{T}\mathbf{x}^a + \mu\, \mathbf{e}_m , \qquad (7) $$

where the parameter $\mu$ specifies the magnitude of the additive perturbation and, ideally, the
operator $\mathbf{T}$ converting the high-resolution variational analysis onto the $n_w$-dimensional space
of the ensemble satisfies the relation $\mathbf{T}\mathbf{T}^T = \mathbf{I}$, though presently in our implementation
this is not the case. Note that, in the application to GEOS ADAS, the operator $\mathbf{T}$ involves
remapping of the central analysis to the topography of each member. Re-centering prevents
the ensemble from drifting far from the (hybrid) variational analyses, and additive inflation
is one way of boosting error growth (e.g., Mitchell et al. 2002, Houtekamer et al. 2005, and
Hamill and Whitaker 2005).

The second hybrid strategy examined in the present work relies on the "filter-free"
procedure, constructed by simply replacing expression (7) with

$$ \mathbf{w}^a_m = \mathbf{T}\mathbf{x}^a + \mu\, \mathbf{e}_m , \qquad (8) $$

completely removing the EnKF component from the cycle. By construction, the mean
ensemble analysis equals the variational (hybrid) analysis, aside from differences in resolution.
Notice that both strategies (7) and (8) employ the same additive perturbations $\mathbf{e}_m$, which in
practice means drawing from the same database of 48-minus-24-hour forecast
NMC-method-like differences.
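
The two ways of forming the final ensemble of analyses can be contrasted with a short sketch. It assumes the central analysis has already been converted to the ensemble resolution (the remapping operator is not modeled), that the perturbations are drawn at random from a pool of forecast differences, and that the sample mean of the drawn perturbations is removed so that the ensemble mean coincides with the central analysis; the last point is an assumption of this sketch rather than something stated above. All function names are hypothetical.

```python
import numpy as np

def draw_perturbations(pool, n_members, rng):
    """Pick n_members columns from a pool of forecast differences.

    The sample mean is removed (assumption) so the perturbations average to zero.
    """
    picks = rng.choice(pool.shape[1], size=n_members, replace=False)
    sample = pool[:, picks]
    return sample - sample.mean(axis=1, keepdims=True)

def recenter_and_inflate(ens_analyses, central_analysis, perturbations, mu):
    """Re-center filter analyses on the central analysis and add mu-scaled perturbations."""
    deviations = ens_analyses - ens_analyses.mean(axis=1, keepdims=True)
    return central_analysis[:, None] + deviations + mu * perturbations

def filter_free_ensemble(central_analysis, perturbations, mu):
    """Filter-free ensemble: central analysis plus mu-scaled perturbations only."""
    return central_analysis[:, None] + mu * perturbations
```

The only difference between the two functions is the presence of the filter-derived deviations; in both cases the ensemble mean is pinned to the central analysis.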

3. GEOS ADAS 3DVar Ensemble Hybrid


In GEOS ADAS the variational problem of minimizing (1) is solved using the Gridpoint
Statistical Interpolation (GSI; Kleist et al. 2009a) analysis and the preconditioning
formulation of Derber and Rosati (1989). The static background error covariance matrix
is implemented as a series of recursive filters producing nearly Gaussian and isotropic
correlation functions following Wu et al. (2002), and tuned from GEOS forecasts (Wei Gu
contribution in Rienecker et al. 2008); the hybrid background error covariance matrix uses
an ensemble of GEOS background fields in a hybrid-capable GSI (David F. Parrish, personal
communication). Satellite radiances are processed using the Community Radiative Transfer
Model (CRTM; Kleespies et al. 2004) and the online variational bias-correction procedure of
Derber and Wu (1998). A normal-mode-based balance constraint term following Kleist et al.
(2009b) is applied to the static increment as well as to the ensemble part of the increment
whenever the hybrid analysis is used.

The ensemble hybrid-capable GEOS ADAS relies on the GEOS global atmospheric general
circulation model (AGCM), developed at NASA/Goddard. The GEOS AGCM is built
under the infrastructure of the Earth System Modeling Framework (ESMF; Collins et al.
2005) and couples cubed-sphere hydrodynamics (Putman and Lin 2007) with various
physics packages including a modified version of the Relaxed Arakawa-Schubert convective
parameterization scheme of Moorthi and Suarez (1992), the catchment-based hydrological
model of Koster et al. (2000), the multi-layer snow model of Stieglitz et al. (2001), and
the radiative transfer model of Chou and Suarez (1999), which uses interactive climatological
aerosols from the Goddard Global Ozone Chemistry Aerosol Radiation and Transport
(GOCART; Colarco et al. 2010) package.


In GEOS ADAS, assimilation is performed using the incremental analysis update (IAU) 
procedure of Bloom et al. (1996). A schematic representation of standard IAU appears in the 
top panel of Fig. 2. Considering for example the availability of observations around 00 UTC 
and of three-hourly AGCM background fields, the GSI analysis (purple boxes) produces an 
increment that is converted into a tendency and used to force a 6-hour (corrector) model 
integration (red triangles); this is followed by a 6-hour (predictor) integration period during
which the model runs free of the analysis forcing so as to produce backgrounds (green
upside-down triangles) for the next assimilation cycle; the prediction period can be extended
beyond 6 hours to complete, say, a 5-day forecast (horizontal orange dashed lines). The cycle
of running GSI and AGCM takes place whether GEOS ADAS is performing its traditional 
3DVar procedure or its hybrid extension. The only difference between these two options is 
that in the latter case, an ensemble of background fields is required for GSI to internally 
augment its background error covariance information, through (1). Hereafter, this cycle 
will be referred to as the central ADAS. It usually operates at a higher resolution than the 
ensemble ADAS (see below). 
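
The corrector and predictor segments of IAU can be sketched with a toy model. This is a schematic only: the model is an arbitrary user-supplied tendency function integrated with forward Euler, the increment is applied as a constant tendency over a 6-hour window, and details such as the placement of the window relative to the analysis time are not represented. The names `integrate` and `iau_cycle` are hypothetical.

```python
import numpy as np

def integrate(state, tendency_fn, hours, dt_hours, forcing=None):
    """Forward-Euler integration of a toy model, with an optional constant forcing."""
    steps = int(round(hours / dt_hours))
    for _ in range(steps):
        tend = tendency_fn(state)
        if forcing is not None:
            tend = tend + forcing           # add the analysis tendency
        state = state + dt_hours * tend
    return state

def iau_cycle(state, increment, tendency_fn, window_hours=6.0, dt_hours=0.5):
    """One IAU cycle: forced corrector segment followed by a free predictor segment."""
    forcing = increment / window_hours      # increment converted into a tendency
    corrected = integrate(state, tendency_fn, window_hours, dt_hours, forcing)
    background = integrate(corrected, tendency_fn, window_hours, dt_hours)
    return corrected, background

# Example with a linearly damped toy model (illustrative choice)
x0 = np.array([1.0, -2.0, 0.5])
dx = np.array([0.3, 0.0, -0.1])
x_corr, x_bkg = iau_cycle(x0, dx, lambda x: -0.01 * x)
```

With a model that has no dynamics of its own, the state at the end of the corrector segment equals the initial state plus the full increment, which is the defining property of the gradual insertion.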

Generation of the ensemble of background fields to make up the ensemble background
error covariance $\mathbf{B}_e$ involves AGCM integrations similar to those of the central ADAS, but
generally carried out at lower resolution. In turn, the ensemble of backgrounds requires an
ensemble of "initial conditions" (analyses) to be available. At least three options exist within
GEOS ADAS to generate an ensemble of analyses. The standard option follows Whitaker et
al. (2008), as described earlier, and relies on the ensemble Kalman filter (EnKF) software of
J. S. Whitaker, from NOAA/ESRL. This is the same software presently used in the NCEP
operational global data assimilation system. Alternatively, one can generate an ensemble
of GSI analyses, but this is considerably more computationally demanding than using the
EnKF since it involves a complete variational analysis for each member of the ensemble.
Lastly, an option to exercise the filter-free ensemble analysis is also available. Regardless of the
ensemble analysis scheme, once analyses are available, a corresponding set of background
fields is generated through IAU-based AGCM integrations, similar to those of the central
ADAS. The IAU-based ensemble procedure is illustrated in the bottom panel of Fig. 2.
Availability of observations and an ensemble of backgrounds triggers one of the ensemble
analysis options (EnAna; right-placed, purple boxes), including re-centering and additive
inflation, generating an ensemble of analyses which are then turned into an ensemble of
tendencies used to initialize the ensemble of AGCM integrations, which are forced during the first
6 hours (light-red triangles) and unforced during the 6-hour background prediction period
(light-green, upside-down triangles).

There is a subtle difference to note in how the GEOS ADAS IAU-based ensemble
evolves its members when the EnKF is used versus when the filter-free strategy is used
instead. With the EnKF, each member carries its own set of initial conditions, needed by the
GEOS AGCM, from one cycle to the next. With the filter-free strategy, the initial
conditions for the ensemble of AGCM integrations are generated by simply converting the
(high-resolution) initial conditions from the central (hybrid) cycle to the configuration of
the ensemble; namely, at each cycle, all members start from the exact same set of initial
conditions; the only thing making these integrations distinct is the IAU forcing
term used by each member, each derived from the ensemble analysis equation (8).


4. Evaluation of hybrid strategies in GEOS ADAS 

In what follows, we present a discussion of results obtained for experiments from a single
analysis as well as from a fully cycled ADAS. Regular, non-hybrid, 3DVar results are compared with
results from hybrid 3DVar analyses produced at 0.5-degree resolution on 72 vertical levels
and relying on a 32-member, 1-degree, 72-level ensemble generated by either the EnKF or
the filter-free procedure described above.

a. Non-cycling hybrid analysis 

One of the first things we examined when using an ensemble of backgrounds in a hybrid GSI
analysis was how the analysis increment changes with respect to its non-hybrid
counterpart. Figure 3 provides an illustration of the change in analysis increment, measured
in total energy units, for an analysis calculated at a single synoptic time using: (i) regular
3DVar, with only the static background error covariance (left); (ii) 3DVar with a background
error covariance matrix that is fully determined by the 32-member ensemble (center); and (iii)
3DVar hybrid, when 50% of the background error covariance matrix comes from the ensemble
and the remaining 50% comes from its regular static background error covariance matrix
(right). The ensemble-only case (center) shows considerably more activity in the tropics
than the static-only case (left); the resulting hybrid (right) increment
shows a slight, but noticeable, energy increase in the mid-tropospheric and low-stratospheric
levels. A little less energy seems to be present along the Southern tropospheric jet in the
ensemble case (center) when compared to the static case (left), with the resulting hybrid retaining
the energy in this region (right).

Another aspect of relevance when upgrading to hybrid analyses relates to
how balance is affected. In its 3DVar configuration, GSI has the capability of applying a
tangent linear normal mode constraint (TLNMC) to the increment (see Kleist et al. 2009b).
The constraint can be applied to either part of the increment (essentially to either of the
two terms in eq. 2, or both; see Kleist 2012). Figure 4 shows two illustrations of the result
of balancing the increment in various configurations of GSI. The panel on the left shows the
total cost function during the iterations of the GSI minimization when using: traditional
3DVar without TLNMC (black curve); traditional 3DVar with TLNMC (red curve); hybrid
3DVar with TLNMC applied only to the static part of the increment (green curve); and hybrid 3DVar
with TLNMC applied to the full increment (blue curve). The behavior is typical of adding
constraints to the analysis, that is, with balance, the cost settles a little higher than when
no constraint is applied. The hybrid minimization tends to reduce the cost when compared
to the static-balanced configuration; this is particularly noticeable in the first outer minimization
(first 100 iterations; compare the green and blue curves with the red curve). This
is an indication that the hybrid minimization recovers the fit to the observations, which is somewhat
deteriorated when the constraint is added to traditional 3DVar.

The real measure of improved balance is displayed in the right panel of Fig. 4, where
the spectrum of the vertically integrated mass-wind divergence increment is shown for the
same four configurations. The color scheme is preserved, and the curves show clearly that
TLNMC brings considerable improvement in balance when applied to traditional 3DVar
(compare black and red curves). It is also clear from the figure that applying TLNMC only
to the static part of the increment when hybrid 3DVar is used is rather troublesome (green
curve). This is natural since nothing guarantees the ensemble contribution to the increment,
through its background error covariance matrix $\mathbf{B}_e$, to be balanced in any way; TLNMC
must be applied to the full increment (blue curve) for balance to be acceptable in the hybrid
configuration. However, this latter case is not perfect, since some power in the
spectrum still remains at large wave numbers, which would best be reduced. As pointed
out by Kleist (2012; see Figure 4.2 on page 108 of that work), this is a consequence of the
dual-resolution aspect of the hybrid analysis and some aliasing of the winds. It is possible to
use scale-dependent weights to reduce some of the aliasing (see Kleist 2012, Fig. 4.4 of
that work), but this is part of future work. At present, the default in GEOS hybrid ADAS
is to apply TLNMC to the full increment.

The remaining illustrations in this section summarize results and comparisons from three
experiments covering the month of April 2012. The abbreviations and brief explanations of
the experiments follow:

• Control (CTL): traditional 3DVar, similar to what is used by GMAO Operations,
though experiments here are at the coarser 0.5-degree resolution.

• Hybrid (HY5): Dual-resolution hybrid ADAS using 50% static and 50% ensemble
background error covariance contributions, with an ensemble of analyses generated by
the EnKF.

• Hybrid (HYA): similar to HY5, but using the filter-free procedure, that is, at each
cycle, an ensemble of analyses is generated by adding scaled NMC-like perturbations
to the hybrid (central) variational analysis.

Evaluation of the results of these experiments examines familiar diagnostics:
observation-minus-analysis (OMA), observation-minus-background (OMB), and observation-minus-forecast (OMF)
residual statistics, monthly mean comparisons with corresponding means from other numerical
weather prediction (NWP) centers, and forecast skill scores. Additionally,
ensemble-related diagnostics have also been examined to evaluate the performance of the ensemble
itself. These include monthly means of the ensemble mean analyses and/or backgrounds,
OMA, OMB and OMF residual statistics for the mean and ensemble members, and also the time
evolution of ensemble spread. Rank histograms (of, say, OMB residuals) have been looked at,
but we have found them to be rather difficult to interpret given the uncertainties associated
with the observations (see Hamill 2001); therefore we refrain from discussing them here.

b. About the ensemble itself 

We have seen in Fig. 1 how much re-centering and additive inflation contribute to modifying
the analysis increments calculated by the EnKF. In addition to what was said earlier, we
should point out that we have found re-centering and additive inflation to be necessary within
the context of the small-size ensemble GEOS hybrid ADAS. Without re-centering, the EnKF
analyses were found to diverge from the central hybrid analysis; without additive inflation, the
ensemble was found to collapse rather quickly. Furthermore, finding the scaling parameter
$\mu$ multiplying the additive inflation term requires careful tuning. We have found a value of
0.25 to be rather reasonable when the EnKF is used. This is considerably lower than the value
of 0.40 presently used in the NCEP hybrid 3DVar (Daryl Kleist, pers. comm.). However,
when using the filter-free approach, the value of 0.40 was found to be more adequate.

In a cycling situation, the interplay between re-centering and inflation must lead to
reasonable forecast spread. Figure 5 illustrates the time evolution of the global (largely
tropospheric) spread of a 32-member ensemble for typical experiments performed with GEOS
hybrid ADAS. The panel on the left uses the EnKF for its ensemble analysis and shows how
the initial spread (blue curve) changes as the members evolve within the 9-hour background
period (green, red, and black for the 3-, 6- and 9-hour backgrounds, respectively). The
resulting hybrid ADAS performs rather well (see below), even though there is not much error
growth within the 9-hour background period (note that the green, red and black curves are very
close to each other); the growth of error is nevertheless consistent within the same period, with
the smallest error seen in the 3-hour background and the largest in the 9-hour background.
The panel on the right shows similar forecast spread for various times within the background
period, but now when the filter-free approach is used to generate the ensemble of analyses.
The initial spread is zero by construction (blue curve); the overall error growth is smaller
than when the EnKF is used, and the error growth within the 6-hour background period
is now considerably larger. However, as we will see shortly, even with this difference in
forecast spread within the 6-hour background period, the end results from the two ensemble
generation procedures are very similar, with the corresponding hybrid ADAS performing rather
closely.
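
For reference, a global spread value of the kind plotted in Fig. 5 can be computed as sketched below. The exact definition used for the figure (variables included, vertical range, weighting) is not given here, so this sketch assumes the spread is the square root of the weighted mean of the ensemble variance over all grid points; the function name and the optional weights are hypothetical.

```python
import numpy as np

def ensemble_spread(ens, weights=None):
    """Scalar spread of an ensemble stored as (n_points, n_members).

    Computes the unbiased ensemble variance at each point, averages it with
    optional weights (e.g., grid-cell area), and returns the square root.
    """
    var = ens.var(axis=1, ddof=1)
    if weights is None:
        weights = np.ones_like(var)
    return float(np.sqrt(np.sum(weights * var) / np.sum(weights)))
```

Members that start from identical initial conditions give a spread of exactly zero under this definition, consistent with the blue curve of the filter-free case.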

c. Evaluation with respect to observations


Figure 6 shows vertical profiles of monthly averaged zonal wind (top) and temperature
(bottom) radiosonde OMB residuals over three regions of the globe, namely, Northern
Hemisphere (NH; left), tropics (center), and Southern Hemisphere (SH; right). Two hybrid
experiments, one using the EnKF (HY5, red) and another using the filter-free scheme (HYA,
green), are compared to the traditional 3DVar control experiment (CTL, blue). The only
noticeable differences are in the tropics and SH for zonal winds, where the hybrid
experiments show reduced biases with respect to the control; the EnKF and simplified (filter-free)
schemes are rather comparable to each other. Results for temperature remain rather neutral.
Examination of the standard deviation of the OMB residuals for both winds and temperature
indicates negligible differences among all three experiments (not shown).
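
Profiles of residual statistics such as those in Fig. 6 amount to binning residuals by pressure and computing a mean (bias) and standard deviation per layer. The sketch below assumes the background has already been interpolated to the observation locations and that layers are defined by increasing pressure edges; the function name and the layer edges in the example are hypothetical.

```python
import numpy as np

def residual_profile(obs, background_at_obs, pressure, layer_edges):
    """Mean and standard deviation of OMB residuals in pressure layers.

    layer_edges must be increasing (e.g., in hPa); observations outside the
    edges are ignored. Empty layers are returned as NaN.
    """
    omb = obs - background_at_obs
    layer = np.digitize(pressure, layer_edges) - 1   # layer index per observation
    n_layers = len(layer_edges) - 1
    bias = np.full(n_layers, np.nan)
    std = np.full(n_layers, np.nan)
    for k in range(n_layers):
        sel = omb[layer == k]
        if sel.size > 0:
            bias[k] = sel.mean()
            std[k] = sel.std()
    return bias, std

# Example with four observations and two layers (100-500 hPa, 500-1000 hPa)
bias, std = residual_profile(
    np.array([1.0, 3.0, 5.0, 7.0]), np.zeros(4),
    np.array([200.0, 300.0, 600.0, 700.0]), np.array([100.0, 500.0, 1000.0]))
```

The same routine applies unchanged to OMA or OMF residuals by passing the analysis or forecast values at the observation locations instead of the background.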

It is also possible to examine the impact of observations on the analysis following Todling
(2013). This is an observation-space approach that uses the inverse of the observation
error variances to define a measure for evaluating the contribution of various observing
systems to the cycling assimilation. Figure 7 displays impact results for the three experiments
under consideration: control (black), EnKF-based hybrid (cyan), and filter-free-based hybrid
(magenta). Regardless of the underlying analysis procedure, all three experiments show
aircraft, radiosondes, and Aqua AIRS as the dominating observing systems in GEOS ADAS.
These observing systems tend to display smaller impact when the cycling analysis is based
on a hybrid approach as compared to traditional 3DVar; the hybrid strategies seem to rely
slightly more on these observing systems than does traditional 3DVar.

Figure 8 shows vertical profiles of standard deviations, calculated over the month of April
2012, for zonal wind radiosonde OMF residuals of the 24-hour forecasts. Though rather small,
the benefit of using a hybrid assimilation strategy shows in both the tropics and the Southern
Hemisphere. Here again, the difference between the EnKF-based system and that using the
filter-free configuration is very small, with some advantage shown for the latter in the SH.

d. Evaluation with respect to independent analysis 

We routinely compare monthly mean analyses with those from other NWP centers.
Figure 9 shows the differences of the April 2012 zonally averaged zonal wind for each experiment
with the corresponding ECMWF operational analysis. Panels in the figure are differences for
the control (CTL, top left), the filter-free hybrid scheme (HYA, top right), and the
EnKF-based hybrid (HY5, bottom left). Compared with the control, both hybrid procedures
obtain monthly mean analyses considerably closer to ECMWF's monthly mean analysis; this is
especially noticeable in the tropics. The bottom-right panel shows the difference between the
monthly mean of the ensemble mean EnKF analysis (from HY5) and the ECMWF operational analysis.
Comparing this result with, say, that in the bottom-left panel illustrates the behavior and
reliability of the underlying EnKF ensemble analyses, though in the presence of re-centering
it serves mainly as a sanity check to show that inflation averages away.

e. Evaluation with respect to self analysis 

Lastly, we show some results when comparing forecasts from each of the three experiments 
with their own respective analyses. Figure 10 displays the zonally-averaged wind RMS error
of the 24-hour forecast, as a function of pressure, for three regions of interest. Results are
for the three experiments under consideration: the control (blue), and the two hybrid
strategies, EnKF-based (HY5, red) and filter-free (HYA, green). Both hybrid strategies yield the same
improvement in RMS error in the Northern and Southern Hemispheres, but result in some
deterioration in the tropical mid-troposphere, with the filter-free procedure being less damaging
than the EnKF. This behavior is opposite to that seen when examining both the monthly
mean analyses and mean OMB radiosonde residuals, in which the hybrid strategies amounted to
an improvement over traditional 3DVar. This remains an issue to tackle in future studies with
the GEOS Hybrid ADAS.
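
As a rough illustration of the kind of regionally averaged RMS error displayed in Fig. 10, the sketch below computes the RMS difference between a forecast and its verifying analysis over a latitude band at a single pressure level. The function name, the cosine-of-latitude weighting, and the band limits used in the example are assumptions made for illustration; the text does not specify how the regional averages are formed.

```python
import math


def regional_rms_error(forecast, analysis, lats_deg, lat_min, lat_max):
    """RMS forecast error over a latitude band (hypothetical helper).

    forecast, analysis: zonal-mean values at one pressure level, one per latitude.
    lats_deg: latitudes in degrees matching the two fields.
    lat_min, lat_max: band limits in degrees (illustrative, not from the paper).
    """
    weighted_sq_error = 0.0
    weight_total = 0.0
    for f, a, lat in zip(forecast, analysis, lats_deg):
        if lat_min <= lat <= lat_max:
            # Assumed area weighting: proportional to cos(latitude).
            w = math.cos(math.radians(lat))
            weighted_sq_error += w * (f - a) ** 2
            weight_total += w
    if weight_total == 0.0:
        raise ValueError("no grid points fall inside the requested band")
    return math.sqrt(weighted_sq_error / weight_total)
```

Applying the same function level by level would give an error profile as a function of pressure for each region.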

In many ways, successful procedures must amount to improvement in the 500 hPa geopotential
height anomaly correlations. Self-analysis evaluation results appear in Fig. 11 for
5-day forecasts in both Northern (top-right) and Southern Hemisphere (top-left). Curves for
the control experiment are in blue, those for the EnKF-based hybrid are in red, and those for
the filter-free strategy are in green. The corresponding statistical significance curves appear
in the bottom panels. The NH scores are essentially neutral, but those in the SH show
significant benefit from hybrid assimilation (the bottom-left panel shows red and green curves
outside and above the significance boxes). Both hybrid strategies bring comparable and non-negligible
improvements up to 5 days in their forecasts. We must stress the word comparable, as we see
the filter-free procedure performing almost indistinguishably from a system
using the EnKF to generate the ensemble of analyses.
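
For readers less familiar with the score, a minimal sketch of a centered anomaly correlation is given below. It is only an illustration under stated assumptions: the function name, the optional area weights, the removal of the mean anomaly, and the climatology argument are not taken from the text, which does not state which variant of the score or which climatology is used.

```python
import math


def anomaly_correlation(forecast, analysis, climatology, weights=None):
    """Centered anomaly correlation between a forecast and its verifying analysis.

    All fields are flat lists over the verification region (e.g. 500 hPa height).
    weights: optional area weights (assumed; uniform if omitted).
    """
    if weights is None:
        weights = [1.0] * len(forecast)
    wsum = sum(weights)

    # Anomalies are departures from the (assumed) climatology.
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    a_anom = [a - c for a, c in zip(analysis, climatology)]

    # Remove the weighted regional mean of each anomaly field (centered variant).
    f_mean = sum(w * x for w, x in zip(weights, f_anom)) / wsum
    a_mean = sum(w * y for w, y in zip(weights, a_anom)) / wsum

    cov = sum(w * (x - f_mean) * (y - a_mean)
              for w, x, y in zip(weights, f_anom, a_anom))
    var_f = sum(w * (x - f_mean) ** 2 for w, x in zip(weights, f_anom))
    var_a = sum(w * (y - a_mean) ** 2 for w, y in zip(weights, a_anom))
    return cov / math.sqrt(var_f * var_a)
```

In a self-analysis evaluation such as the one described here, `analysis` would be the verifying analysis produced by the same experiment that generated the forecast.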


5. Closing remarks 


In the process of implementing a 3DVar hybrid strategy for the Goddard Earth Observing
System (GEOS) atmospheric data assimilation system (ADAS) using the ensemble
Kalman filter (EnKF) of Whitaker and Hamill (2002), under a dual-resolution approach,
we have found re-centering and additive inflation to play a fundamental role in determining
the behavior of the ensemble. Examination of some preliminary results led us to consider
generating the ensemble by simply adding NMC-method-like perturbations to the central
(hybrid) variational analysis at each cycle, thus completely bypassing the EnKF. This so-called
filter-free procedure was put through the same evaluation test suite as that used to examine
the quality of our EnKF-based 3DVar hybrid implementation. Both schemes are shown to
perform rather similarly, bringing statistically significant improvements to GEOS ADAS.
Indeed, the improvements to GEOS ADAS due to hybridization are comparable in magnitude
to those seen at NCEP when upgrading its 3DVar system to a hybrid strategy, around May
2012. The successful evaluation of the filter-free approach is encouraging since one of its
main advantages is that it avoids having to maintain two considerably different analysis
systems, namely, one to perform the EnKF and another to perform the 3DVar hybrid analysis
(the Grid-point Statistical Interpolation analysis, in the present case). Though not the main
motivation for this work, it is also important to stress the computational advantages
of the filter-free approach over the EnKF, or any alternative ensemble filter scheme, since
the filter-free scheme does not explicitly analyze the members of the ensemble.
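
The filter-free ensemble generation summarized above can be sketched in a few lines. This is a minimal illustration, not the GEOS implementation: the perturbation bank is a hypothetical archive of NMC-method-like perturbations assumed to be already at the ensemble resolution, the default coefficient of 0.25 is borrowed from the additive inflation value quoted in the caption of Fig. 1, and the removal of the sample mean of the drawn perturbations (so that the ensemble mean coincides with the central analysis) is an assumption of this sketch.

```python
import random


def filter_free_ensemble(central_analysis, perturbation_bank, n_members,
                         coeff=0.25, rng=None):
    """Generate members as central analysis + coeff * perturbation.

    central_analysis: variational analysis at ensemble resolution (flat list).
    perturbation_bank: hypothetical list of NMC-method-like perturbation fields.
    coeff: additive inflation coefficient (0.25 is illustrative).
    """
    if rng is None:
        rng = random.Random(0)
    # Draw one distinct perturbation per member; no member is analyzed by a filter.
    draws = rng.sample(perturbation_bank, n_members)
    n_points = len(central_analysis)

    # Assumption: remove the sample mean of the draws so the ensemble
    # stays centered on the variational analysis.
    mean_pert = [sum(d[i] for d in draws) / n_members for i in range(n_points)]

    return [
        [central_analysis[i] + coeff * (d[i] - mean_pert[i])
         for i in range(n_points)]
        for d in draws
    ]
```

Because each member is obtained by simple addition, the cost of this step is negligible next to that of running an ensemble filter, which is consistent with the computational advantage noted above.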

At this point, we can only speculate on the reasons why the EnKF and filter-free
procedures perform so similarly. Factors that are likely to contribute to this are the small
size of the ensemble and the dual-resolution aspect of the GEOS ADAS implementation.
Future tests are planned to isolate the role played by the resolution interplay.
Further tests are also planned to look at the role played by the size of the ensemble, though
we expect it to be harder for these to provide conclusive results since they may require
too large an ensemble to be affordable in real applications such as the ones presented here.

Fig. 1. Illustration of contribution from each step taking place after the EnKF ensemble
of analyses is generated. The panels show 500 hPa temperature: analysis increment for a
given ensemble member (top left); effect of re-centering this given member about the central
GSI analysis (top right); effect of applying additive inflation to the member analysis with a
coefficient of 0.25 (bottom left); and resulting increment after both re-centering and additive
inflation are applied (bottom right).

Fig. 2. Schematic of AU as implemented in GEOS hybrid ensemble-variational atmospheric
data assimilation system.

Fig. 3. Zonal mean analysis increment, in total wet energy (J/kg) norm, using a standard
3DVar (left), a 3DVar when the background error covariances are fully determined by the
ensemble (center), and a hybrid 3DVar when the covariances are a 50% weighted sum of the
static- and ensemble-derived background error covariances (right).

Fig. 4. The panel on the left shows the total cost function as it changes during the iterations
of the GSI minimization; all cases are calculated for the same synoptic time but GSI is
configured as follows: static (non-hybrid) 3DVar without balance constraint (black curve);
(non-hybrid) 3DVar with TLNMC balance constraint (red curve); hybrid 3DVar without
balance constraint applied to hybrid part of increment (green curve); and hybrid 3DVar
with balance constraint applied to full increment (blue curve). The panel on the right shows
the integrated mass-wind divergence spectra of the analysis increment as a function of wave
number for the same four configurations; color scheme of curves is as in panel on the left.

Fig. 5. Global spread of a 32-member ensemble measured in total energy units (J/kg);
when EnKF is used to generate ensemble (top), and when filter-free ensemble scheme is
used instead (bottom). The curves are for: analysis spread before re-centering and inflation
(blue); 3-, 6- and 9-hour backgrounds (green, red, and black respectively). Totals exclude
levels roughly above 10 hPa.

Fig. 6. Regionally-averaged, monthly mean of radiosonde OMB residuals of zonal wind
(top) and temperature (bottom) for three experiments: control (blue), EnKF-based hybrid
(red), and filter-free hybrid (green), shown for: Northern Hemisphere (left column), tropics
(center column), and Southern Hemisphere (right column).

Fig. 7. Observation impact on the analysis for three 3DVar experiments: control,
non-hybrid (black bars); hybrid using EnKF (cyan bars); and hybrid using simplified, filter-free
approach (magenta bars). In addition to the observation types shown, all experiments use
GPS radio occultation, but results are not shown here due to a little glitch in the output
files saving their corresponding information (basically, GPS impacts are of the magnitude of
those of radiosondes, and are comparable among the different analysis approaches).

Fig. 8. Similar to Fig. 6, but for standard deviation. Only zonal winds are shown since
temperature results are neutral.

Fig. 9. April 2012 monthly mean of zonally-averaged zonal wind analysis differences with
ECMWF operational analysis from four different ADAS scenarios: control, traditional 3DVar
(top left); filter-free-based hybrid 3DVar (top right); EnKF-based hybrid 3DVar (bottom
left); and EnKF ensemble mean (bottom right).

Fig. 10. Twenty-four hour forecast RMS error, with respect to self-analysis, of
regionally-averaged zonal winds for the three experiments under consideration: control (blue),
EnKF-based hybrid (red), and filter-free hybrid (green); Northern Hemisphere (left), tropics
(center), and Southern Hemisphere (right).

Fig. 11. Anomaly correlation of the 500 hPa height of 5-day forecasts (top) verified with
respect to own analysis, and shown for Northern (left) and Southern (right) Hemispheres for
the three experiments under consideration: the control (blue), EnKF-based hybrid (red), and
filter-free hybrid (green). Significance plots appear beneath anomaly correlations with
significance boxes colored according to experiment designation; results are statistically significant
when curves appear outside, and above, the corresponding box.